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Nested Effects Models (NEMs) are a class of graphical models in- 
m troduced to analyze the results of gene perturbation screens. NEMs 

explore noisy subset relations between the high-dimensional outputs of 
phenotyping studies, e.g. the effects showing in gene expression profiles 
or as morphological features of the perturbed cell. 

In this paper we expand the statistical basis of NEMs in four direc- 
tions: First, we derive a new formula for the likelihood function of a 
NEM, which generalizes previous results for binary data. Second, we 
prove model identifiability under mild assumptions. Third, we show 
that the new formulation of the likelihood allows to efficiently traverse 
model space. Fourth, we incorporate prior knowledge and an automated 
variable selection criterion to decrease the influence of noise in the data. 
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1 Introduction 



Functional genomics has a long tradition of inferring the inner working of 
a cell through analysis of its response to various perturbations. There are 
several perturbation techniques suitable for large-scale analysis in different 
organisms. Experiments with gene knock-outs have been very successful in 
uncovering gene function ( Hughes et al. , 2000 ) , and gene silencing by RNA 



1998) allows perturbation screens on a genome-wide 



interference (Fire et al. 
scale. 

The changes observed in the cell are called the phenotype of the perturba- 
tion. In most biological studies, perturbation effects are measured by single 
reporters like cell death or growth. Analysis of these phenotypes can reveal 
which genes are essential for an organism (Boutros et al. , 2004) or for a partic- 



ular pathway (Gesellchen et al. 2005). However, these screens do not reveal 



how the genes contribute to regulatory networks or signalling pathways. 

More details about gene function and interactions are contained in high- 
dimensional phenotypes that give a global view of changes in the cell. High- 



dimensional phenotypes include gene expression profiles (Hughes et al. 2000 



Boutros et al. , 2002 Driessche et al. , 2005), metabolite concentrations (Raams- 



donk et al. , 2001), sensitivity to cytotoxic or cytostatic agents (Brown et al. 



2006), or morphological features of the cell (Ohya et al. 2005). While high- 



dimensional phenotypic profiles promise a comprehensive view of the function 
of genes in a cell, only limited work has been done so far to adapt statisti- 
cal and computational methodologies to the specific needs of large-scale and 
high-dimensional phenotyping screens. 



Phenotypic profiles offer only indirect information A key obstacle to 
inferring genetic networks from high-dimensional perturbation screens is that 
phenotypic profiles generally offer only indirect information on how genes in- 
teract. Cell morphology or sensitivity to stresses are global features of the 
cell, which are hard to relate directly to the genes contributing to them. Gene 
expression phenotypes also offer an indirect view of pathway structure due to 
the high number of post-transcriptional regulatory events like protein modifi- 
cations. For example, when silencing a kinase we might not be able to observe 
changes in the activation states of other proteins involved in the pathway. The 
only information we may get is that genes downstream of the pathway show 
expression changes. Thus, phenotypic profiles may provide only an indirect 
view of information flow and pathway structure in the cell. 



Statistical analysis of phenotyping screens Previous work focused on 
clustering phenotypic profiles to find groups of genes that show similar effects 
when perturbed. The rationale is that genes with similar perturbation effects 
are expected to be functionally related. The most prominent method used is 



average linkage hierarchical clustering (Piano et al. 2002 Ohya et al. 2005). 



A complementary approach is ranking genes according to similarity with a 
query gene (Gunsalus et al. 2004). In a supervised setting, first steps have 



been taken to classify genes into functional groups based on phenotypic profiles 



(Ohya et al. 2005). A comprehensive overview of computational models for 



the reconstruction of genetic networks can be found in (Markowetz and Spang 



2007). 



A recent approach especially designed to learning from indirect informa- 



tion and high-dimensional phenotypes are Nested Effects Models (Markowetz 



et al. , 2005, 2007) that reconstruct features of the internal organization of the 
cell from the nested structure of observed perturbation effects. Perturbing 
some genes may have an influence on a global process, while perturbing others 
affects sub-processes of it. Imagine, for example, a signaling pathway acti- 
vating several transcription factors. Blocking the entire pathway will affect 
all targets of the transcription factors, while perturbing a single downstream 
transcription factor will only affect its direct targets, which are a subset of the 
phenotype obtained by blocking the complete pathway. NEMs can be seen as 
a generalization of similarity-based clustering, which orders (clusters of) genes 
according to subset relationships between the sets of phenotypes. So far, a 
likelihood function has been derived for NEMs in the case of discretized or 



binary data (Markowetz et al. , 2005) and p- values of differential expression 
(Frohlich et al. 2007a). For model inference, divide- and- conquer strategies 



have been applied to scale up model search (Markowetz et al. 2007 Frohlich 



et al., 2007b) 



Overview of this paper After introducing a generalized version of NEMs 
in Section [2] we expand their statistical basis in four directions: First, we 
derive a new formula for the likelihood function of a NEM that generalizes 
previous results (Section [3]). Second, we prove model identifiability under mild 
assumptions (Section El). Third, we develop efficient methods of traversing 
model space (Section pj). And finally, we incorporate prior knowledge and a 
variable selection step into model search to decrease the influence of noise in 
the data (Section [6]). We show the applicability of the proposed method in the 
controlled setting of a simulation scenario (Section [7]) and in an application to 
an example in Drosophila immune response (Section [8]). 



2 Definition of nested effects models 



The system of components we consider consists of a set O of no observable 
entities (e.g. mRNA concentrations), and a set A of actions (i.e. gene 
perturbations) applied to the system which are expected to alter the state 
of some observable entities. Both O and A consist of binary variables. An 
altered state of an observable s G O is denoted by s = 1, the basic state 
is s = 0. A value of a = 1, resp. a = 0, means that action a G A was 
performed, resp. not performed. Let D as , (a, s) G M. C i x be a set 
of measurements for observation s after performing action a. The set of all 
measurements D = {D as \(a,s) G Ai} constitutes the data. Note that our 
definition of M. does not require that all s G O are observed for all actions 
a G A. Thus, missing data and the exclusion of failed experiments can directly 
be incorporated into all the results that we develop in the following. 

Definition 1. A (general) effects model is a binary x no matrix F that 
determines the state of the observable s when action a is performed, an entry 
indicating no change, 1 indicating a change. 

Nested effects models are effects models that can be defined in terms of two 
graphs or adjacency matrices. The first graph, T, describes how actions imply 
each other and the second graph, O, how observables are linked to actions. 
Let the actions graph T = (T aa r) be a graph on the vertices A , encoded as an 
n A x n A adjacency matrix with the convention r aa = 1, a G A. We say that 
the edge a — > a' is in T, or for short a — > a', if T aa / = 1. 

Secondly, we assume that each observation is directly linked to exactly one 
action as defined by a function 6 : O — > A. This can synonymously be encoded 
as an x n adjacency matrix 6 = (G os ), with Q as = 5 n =6>(s) f° r a E A, 

s G O (where 5. is the delta function). Write a s if Q as = 1. By this 
definition, contains only zeros except for a single 1 in each column. When 
describing how observables are linked to actions, we tacitly switch between the 
adjacency matrix and the function 9 for the sake of notational convenience. 

We postulate an effect of an action a G A on s G O if and only if there 
exists an action a! G A such that the edge from a to a 1 is in T, and s is directly 
linked to a' (the edge from a' to s is in 0). Since each observable is linked to 
exactly one action, action a has an effect on s if and only if (r©) as = 1. This 
prompts the following definition: 

Definition 2. A nested effects model (NEM) F is an effects model which can 
be represented as a product of Y and as defined above: 

f = re. (2.i) 
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Figure 1: NEM examples. Each plot shows a NEM on A = {X,Y,Z} and 
twelve observables O depicted as dots. The solid arrows define the adjacency 
matrix T. Note that cyclic graphs are allowed, like the right-most example 
which contains a bi-directional edge. Each action in A is connected to ob- 
servables s G O by dashed arrows encoding (which is assumed fixed in all 
three plots). The lower parts of the plots show the model matrix F in each 
of these three cases. Each cell in the matrix corresponds to the state of one 
observable s E O after performing an action a G A. Cells containing observed 
effects are colored black, those containing non-effects are white. The objective 
of structure learning in NEMs is to recover V and from noisy observations 
of effect patterns. 



The parameters V and uniquely determine the model. We therefore use 
P(D | T, 0) interchangeably with P(D \ F). Examples of nested effects models 
are given in Fig. [TJ showing the graphs T and as well as the resulting effects 
model F. 



Transitivity requirement Previous definitions of NEMs required the ac- 



tions graph to be transitively closed (Markowetz et al. , 2005, 2007), restricting 



model space from the space of all graphs to the space of transitively closed 
graphs. This is sensible if paths in the actions graph are interpreted as causal 
chains. For transitively closed graphs, our model has a property that moti- 
vated the name "nested effects model" : The existence of an edge a — > a' in a 
transitively closed graph implies that the effects observed for action a' (i.e. all 
effects s with 9(s) = a') are "nested" in the observed effects of a: 



& {sEO\ F a , s = 1} C {s G O I F as = 1}. 



(2.2) 



This correspondence induces an order homomorphism from V to the subset 
lattice of the set observations, which is satisfactory from a mathematical point 
of view. 

However, admitting only transitively closed graphs as valid models is a 



constraint which makes structure learning computationally hard (Markowetz 
et al. 2007). Even a small change in the model — like removing or adding an 



edge — can make many more changes necessary to preserve transitivity. The 
likelihood function will be quite volatile and the likelihood landscape will not 
be smooth. 

Our calculations here do not rely on the transitivity of the actions graph. 
Our definition of NEMs thus extends the one used in previous studies ( Markowej tz 
eFaLj [2005| [20071 |Frohhch et al.[ |2007a|b| . 



3 Inference of the actions graph 

Likelihood Assuming data independence, the likelihood of the model F 
(with data D fixed) factors into 



P(D\F) = J] P(D as | s = F as ) (3.1) 

(a,s)&M 

oc J] P(D as \s = F as ) (3.2) 

(a,s)eOxA 

or logP(D\F) = log P{D as \ s = F as ) + const, (3.3) 

(a,s)£AxO 

if we define P(s = x\a) = 0.5 for x G {0,1} and (a, s) G (A x O) \ M. 
The quantity log P(D\F) can be expressed in a convenient form: For an 
observable s G O and a perturbation a E A, let the log likelihood ratio 
R sa = log p|"° s | be known, and R = (R sa ) be the O x A matrix of ratios. 
If we let iV be the null matrix, i.e. the model predicting no effects at all, then 

io g P ( z>iF)-io gW ) 6 ^j°° P m0w 

Rsa if Pas 1 

if F,„ = 



E 

(a,s)eAxO 
aeA seO 

J2( F R)aa = tr(FR) , (3.4) 

a<=A 



with "tr" denoting the trace function of a quadratic matrix. This derivation 
of the likelihood applies to both general effects models and nested effects mod- 
els. In particular, in nested effects models Eq. (2.1) allows to represent the 



likelihood as 



\ogP{D\r,6) = tr(reR) + const. (3.5) 

The likelihood function depends on the data only via the likelihood ratios in 
R. This makes our approach very flexible: our method can handle as input 
data binary values, p-values, or any other arbitrary statistic as long as it 
can be converted to a likelihood ratio. Section |8] contains an outline of how 
this quantity can be estimated in a practical application to gene expression 
microarray data. 



Posterior We aim at maximizing the posterior of T and 0, 

PCr.ep) = ™y(r.e) 

oc P(D\T,Q) ■ P(T) ■ P(Q), (3.6) 

where we assume that the parameters T and G are independent and follow 
prior distributions P(r) amd -P(O), which are not necessarily uniform. 

Let Q = (Qsa) be an no x n^ matrix with entries Q sa = P(0(s) = a). We 
assume that the prior links each observation independently to an action, i.e. 

P(Q) = l[P(9(s)=a s ) or log P{6) = ^ Q saa (3.7) 
sec seo 

where a s is the particular value of 6(s) in 0. Consider the data D fixed and 
write L(T,8) = log P(D\T, 6) for the log-likelihood of the data, given the 
model. Then the posterior of the model (T,8) becomes 

\ogP(T,e\D) = L(T,6) + logP(r) + log P(0) + const (3.8) 



(3.6) 

The task is to find the MAP estimate for P(T, 6 | D), 

(f,0) = argmax(L(r,#) + logP(r) + logP(0)) (3.9) 

r,0 

We are particularly interested in finding the optimal actions graph T. Writing 

6 r = argmax (L(r, 0) + log P(9)) , (3.10) 
e 

L(F) = L(I\0r) = maxL(r,0) , (3.11) 

6 

this corresponds to finding 

f = argmax(max(L(r,#) +logP(0)) +logP(T)) 
r e 

= argmax(L(r) + logP(r)) (3.12) 



4 Model identifiability 



We present theorems showing that the maximum likelihood estimator recovers 
the true structure of the actions graph for sufficiently "good" data. All proofs 
are given in the appendix. 

Definition 3. Let some data be observed from the underlying true effects model 
F. Let R be the ratio matrix which has been derived from the data. We say 
that the data is consistent with F if the ratio matrix R has a positive entry 
Rsa (= favors an effect) whenever F has a positive entry F as (= predicts an 
effect) at the corresponding position. 

Theorem 1. If the data is consistent with the effects model F, then the max- 



imum likelihood estimate of (3.4) equals F, 



F = argmax P(D\G) = argmax tr(GR) . (4-1) 
g B g 

□ 

In the light of this theorem it is interesting to find out to what extent 
the actions graph V and the assignment are controlled by the nested effects 
model F = TO. The complete answer is given in Theorem [3] We precede it 
by a definition and a lemma. 

Definition 4. Let F be a nested effects model parametrized by (r,B). Let 

d\ — ► 0j2 — ••• — * — 

be a cycle in T, let n be the circular permutation 
7i = (ai d2...a n ). Let denote the b-th unit column vector of length no, and let 
S = Y2beA ebe, ^(b) t> e the permutation matrix corresponding to tt. We say that 
(T',0') = (TS" 1 , SQ) is a reversal of (T, 0) induced by it (see Fig. \^for an 
example). Two reversals are called disjoint if they are induced by disjoint cyclic 
permutations (i.e. each action is fixed by at least one of the permutations). 

Lemma 2. Let (V, 0') be a reversal of (T, 0) induced by the permutation 
tt = (ai, a 2 , a n ). Then (V, 0') is a valid parametrization of F = T0. 



Lemma [2] states that the two parametrizations (T, 0) and (T', 0') define 
the same nested effects model, thus we call them observationally equivalent. 

For an action a G A, the parents of a are the actions b G A such that b — > a. 
If two distinct actions a, b G A have the same parents, then they are clearly 
indistinguishable by any kind of interventional measurement. This is a general 
limitation, not only a limitation of our model. We propose collapsing these two 



actions in such (Markowetz et al. I 2007b. We exclude indistinguishable 



actions from our considerations and state: 
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Figure 2: A basic example of a reversal. Both plots show a NEM on A = 
{X, Y, Z} and twelve observables O depicted as dots. The actions graphs V 
and r" are both cyclic, but the two models differ in the direction of the cycle 
(clock-wise on the right, against the clock on the left). However, the two 
model matrices F and F' are identical, since the change from V to V can be 
compensated by simultaneously changing the assignment between actions and 
observables (indicated by the dashed arrows). 



Theorem 3. Let (r, G) and (F', 0') be the parameters of two nested effects 
models. Assume that no two distinct actions a,b G A have the same parents 
in F or in F'. Then (F, 0) and (V, 0') are observationally equivalent if and 
only if the tuples can be converted one into another by a sequence of disjoint 
reversals. □ 

Taken together, Theorems [T] and [3] state that under mild conditions, not 
only the true nested effects model F is identifiable for "sufficiently good" data 
(which in practical cases means for a sufficiently high number of replicate 
measurements), but also the underlying actions graph V and the assignment 9 
are unique up to reversals. 



5 Actions graph search 

The space of actions graphs is huge, it contains 2 nA( - nA ~^ elements (recall 
that the diagonal entries of the adjacency matrix equal 1 ). In order to search 



efficiently, we need a fast method for the evaluation of (3.6). The key 
observation is that small changes in T require only small changes in the optimal 
assignment 8, and these can be calculated very fast. 



Elementary moves Let F and T' be neighbours in (5 , i.e. F and F' differ 
exactly in one edge, from action j to action k say. An elementary move is 
defined as the insertion or the removal of one edge in an actions graph (note 



that if we chose the search space (25 to be the transitively closed graphs or the 
acyclic graphs, these moves would not be well defined). Let ej denote the j-th 
unit column vector of length no, and let g k be the k-th unit column vector 
of length n_4. By x T denote the transpose of a vector or a matrix x. Define 
Ejk = Cje[ as the matrix containing only zero entries except a 1 in row j, 
column k. Then 

r = T + e jk E jk , where e jk = 1 - 2E fc . (5.1) 

Optimization of the effects graph Let Gj S = ejgj , j G A, s G O . Then 
L(T,9)-L(f ) tr(TQR) 

seo seo 

^2tr(Te e(s) ■ gjR) = ^ tr(T. g{s) R s .) = ^2tr(R s .T. e{s) ) 

seo seo seo 

^i? s .r. e(s) = ^(i?r) se(s) (5.2) 

seo seo 



It follows from the equations (3. 7), (3. 10) and (5.2) that maximizing 9 with 



respect to L(T, 9) can be done pointwise, i.e. 



9 r (s) = argmax(( J Rr) sa + Q sa ) , s G O. (5.3) 
aeA 



For each s G O, step (5.3) takes 0(no) time, provided that the matrix RT 
is given. It is therefore necessary to keep track of this matrix whenever T is 
changed into a T'. But RT' = RT + tj k REj k is obtained from RT simply by 
adding ej k R-j to the k-th column of RT, so this process takes only 0(nX) time. 



The complete evaluation of 9 according to (5.3) takes O(n^no) time. However 
we can exploit the fact that (in expectation) hardly any of the observable 
effects has to be reassigned. For the moment, fix s G O and consider the 
vector v = (RT) S . + Q s . and let 

w = (RT') s . + Q s . = (R(T + e jk E jk )) s , + Q s . 
= ((RT) s . + e jk (R ej el) s ) + Q s . 

= v + e jk (e^Rej)el = v + e jk R sj el . (5.4) 



By (5.3), t — 9r(s) = argmax a6 _4 v a and 6*r'(s) = argmax ag _4 w a . The vector v 



differs from w at most in its k-th entry. The following cases can occur: 



Ms) 



t 
k 
k 

argmax w a 

t 
k 
k 

argmax w a 
aeA 



, w k < v t 
, w k > v t 



lit ^k 
if t ± k 

if t = k , Wk > Vk 
if t = k , Wk < Vk 

if t k , v k + e jk R S j 
if t ^ k , v k + e jk R S j 
ift — k 
iit — k 



< v t 
> v t 



e jkRsj > o 



(5.5) 



-3k 



R sj < 



Given the matrix RT, the first three cases in (5.5 ) can be calculated in constant 



time. The fourth case requires 0(n_4) time. The elementary moves choose 
every edge j — > k with the same frequency, so the expected relative frequency 
for which the case t = k occurs is — . Therefore, the expected running time 

O(l). We have to do this step for 



for (5.5) is at most O 



■ 1 + 



all s G O, so the calculation of the function 8 can be done in expected O(no) 
time. What remains to do is to update the matrix RT to 



RV = RT + ejkRE 



jk 



RT + €jkR-j€ k 



This only affects the k-th column of RT, to which we add the vector e^R-j- 
The time consumption of this step is O(no). 



Gray code enumeration of actions graphs Actions graphs are treated as 
binary vectors of length n\ — (the diagonal is fixed), and they are enumer- 



ated without redundances using a gray code (Knuth, 2005). Each enumeration 



step alters exactly one edge of the predecessor graph, so we can take advantage 
of our fast update algorithm. It allows the exhaustive search of the actions 
graph space for n < 5 (computation time on a 1GHz computer: a few 
seconds for n = 4 actions, approx. lOmins for n = 5 actions). 



6 Extensions 

In this section, we adapt the raw nested effects model to make it more applica- 
ble to real-life data sets. We discuss methods to incorporate prior knowledge 
on parts of an action graph and to decrease measurement noise by feature 
selection and regularization. 



6.1 Rigid actions graph prior 

In many practical applications, parts of the true actions graph structure is 
already known, and only a fraction of edges has to be estimated from the data. 
Taking advantage of this, we introduce a rigid prior on the actions graph. An 
edge can be declared as known present, known absent, or unknown. Exhaustive 
search is then performed only on those edges whose presence is unknown. 

This permits a novel way of joining new components to a well known signal- 
ing network: Given measurements of a known actions graph and an additional 
action node a, declare only edges starting or ending in a as unknown. The 
reconstruction procedure will then find the position of a within the already es- 
tablished network. We show the feasibility of this procedure in the simulations 
in section 17.21 

6.2 Feature selection and regularization 

In high-dimensional phenotypic readouts, we may encounter a situation in 
which a considerable part of all observables does not react to any intervention 
at all. The occurrence of many false positive effects is an inevitable conse- 
quence. Therefore, it is essential to only include responsive observables into 
the model and discard the rest. 

The null action Our model offers an elegant way of doing feature selection: 
Extend the adjacency matrix V of the actions graph by one null column, which 
can be interpreted as an action that does not affect the observations assigned 
to it (we call it the null action in contrast to the regular actions in A ). The 
optimization procedure in Section [5] then assigns a gene to the null action if 
considering the gene a general non-responder is beneficial to the posterior. 

This method has two advantages: It does hardly cost any extra computa- 
tion time, and the number of responsive genes does not have to be fixed in 
advance. For example a best fitting graph structure might recruit many weakly 
responsive genes, whereas in other situations it might receive less numerous 
but strong support by only a few genes. 

Regularization We complement the null action with a noise reducing regu- 
larization step. A straightforward way is to subtract a (non-negative) constant 
5 from each entry in the ratio matrix R. This amounts to a priori favoring 
non-effects, since 



R, 



— 5 = log 




(6.1) 



■sa 



P(D as | s 



0) ■ exp(5) ' 



Suppose that all values R sa — 5, a G A in some row of R are negative. Then 
any assignment of the observable s to a regular action will decrease the pos- 
terior. Thus, in any model, s will be optimally assigned to the null action. 
It is therefore time-saving to directly exclude this effect before entering the 
reconstruction algorithm. 

The PpO-score We propose a simple heuristic for the optimal choice of 5 
which works well in practice. Let 0(5) be the effects that are still included 
into the reconstruction step after the regularization by 5 has been applied. 
The larger 5, the smaller 0(5), and 5 = corresponds to no pre-selection 
at all. Let (T^, Q$) be the maximum a posteriori estimate derived from the 
(5-)regularized ratio matrix R. Define the posterior per observable score by 

The PpO measures the average contribution of each effect in 0(6) to the log 
posterior value of the best scoring model. Select the optimal value of 5 as 

5 = argmaxPpO(<5). (6.3) 

6 

Figure [5] shows a typical PpO curve, which compares models with varying 
degrees of regularization for the Drosophila data set we will describe in detail 
in section [Sj 



7 Simulation results 

7.1 Robustness of the actions graph reconstruction 

Section [4] proved the identifiability of nested effects models under the assump- 
tion of consistent data. Here we investigate the robustness of NEMs against 
measurement errors, i.e. variability in the ratio matrix. 

Given a "true" NEM and a noise level a, we calculate a consistent ratio 
matrix containing the entry 0.5 (resp. —0.5) whenever the model predicts 
an effect (resp. no effect). Then, we add independent, normally W(0, a 2 )- 
distributed noise to each entry of the ratio matrix. 

An exhaustive search on the actions graph space produces a distribution 
of posterior scores as well as the highest scoring NEM. From this distribution, 
we compute the rank of the score of the original NEM among all scores as well 
as the number of true positive and false positive edges in the highest scoring 
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Figure 3: Reliability of the actions graph reconstruction in the presence of 
noise. The noise level was varied from to 1 in steps of 0.1. For each level of 
noise, the respective plot shows the statistics over 100 sample NEMs containing 
5 edges. Left: The number of true positive edges in the highest scoring NEM. 
Middle: The number of false positive edges in the highest scoring NEM. 
Right: The (distribution of the) rank of the posterior score of the true model 
among all posterior scores. 

NEM. Results are averaged over 100 randomly sampled NEMs for each level 
of noise. 

The results displayed in Figure [3] show the reliability of reconstructing the 
actions graph under increasing noise. Only at noise levels above 0.5 does our 
model start to miss edges (left plot) or to include spurious edges (middle plot) 
and the correct graph may then no longer be the highest scoring (right plot). 
For noise levels below 0.5 we achieve perfect reconstruction in all simulation 
runs. 

7.2 Utility of prior knowledge 

We test the impact of prior knowledge on the quality of the actions graph 
reconstruction in two ways. First, a fixed "true" model consisting of 4 actions, 
50 observables and 5 edges is constructed, and a noisy ratio matrix is generated 
from it. The noise level is set to a = 0.7. 

Starting from this matrix, a series of exhaustive searches is carried out. 
Each time, a prior is generated that either fixes a number of truly present 
edges as present, or which specifies a number of truly absent edges as absent. 
The quality of reconstruction is assessed in terms of sensitivity and specificity 
(regarding only those edges that were not known a priori). Since the quality of 



Prior containing present edges 



Prior containing absent edges 



Expansion of the network 




Figure 4: Effects of prior knowledge. Left: Keep a fixed model, and increase 
the number of known present edges (0, ...,4). Middle: Keep a fixed model, 
and increase the number of known absent edges (0, ...,6). Right: Start with 
an unknown graph of 4 actions, and add new actions (0, ...,6) as well as their 
adjacent edges to the graph. In all three plots, the error bars range from the 
first to the third quartile of the distributions obtained in 100 simulation runs. 

reconstruction heavily depends on the true actions graph topology, we average 
the results over 100 sample runs of this procedure. 

The left and middle plot of Fig. [4] show the results of this procedure. The 
left plot illustrates the reconstruction quality in dependence of the number of 
a priori known present edges. The middle plot does the same for the inclusion 
of prior knowledge about absent edges. Both plots show that including prior 
information considerably increases sensitivity and specificity. In particular, 
information about present edges helps more than information about missing 
edges. 

In a second experiment, a fixed "true" model of 10 actions and 15 edges 
is created, and a noise ratio matrix is generated from it. We randomly pick 
a subgraph of 4 actions, the structure of which is assumed to be completely 
unknown. Another k nodes (k = 0, ...,6) are added to the subgraph, and all 
edges not belonging to the initial subgraph are correctly specified as known 
present /absent via the actions graph prior. For each k, we restrict the original 
ratio matrix to the nodes present in the (k + 4)-nodes subgraph and start an 
exhaustive search. 

Again, the quality of reconstruction is reported by sensitivity and specificity 
averaged over 100 sample runs. The noise level was set to 0.4, and the number 
of observables was set to 200. The results in the right plot of Fig. [4] show a 
strong increase in sensitivity at the cost of a slight decrease in specificity. 



8 Application to Drosophila immune response 



We apply our methodology to data from an RNA interference (RNAi) gene si- 



lencing study on innate immune response in Drosophila melanogaster (Boutros 



et al. , 2002 ). The experiment probes how transcriptional response to lipopolysac- 



charides (LPS) is regulated by signal transduction pathways in the cell. 



Data The data set consists of 16 Affymetrix microarrays: 4 replicates of 
control experiments without LPS and without RNAi (negative controls), 4 
replicates of expression profiling after stimulation with LPS but without RNAi 
(positive controls), and 2 replicates each of expression profiling after applying 
LPS and silencing one of the four candidate genes tak, key, rel, and mkk^/hep. 

Selectively removing one of these signaling components blocks induction of 
all, or only parts, of the transcriptional response to LPS. Boutros et al. (2002) 
show that this observation can be explained by a fork in a signaling pathway 
below tak, with key and rel on the one side and rakkJ^/hep on the other. This 
result clarified the contributions of different pathways to immune response in 
Drosophila (Royet et al. , 2005). 



Previous analyses The experimental design of this study, which includes 
both negative and postive controls, allows to define informative effects of inter- 
ventions and quantify the false positive and false negative rates. In the original 



analysis (Boutros et al. 2002) and two subsequent studies (Markowetz et al. 



2005, 2007) only the 68 genes differentially expressed between positive and 



negative controls were used as effect reporters. Markowetz et al. (2005) pro- 
pose a simple discretization scheme based on the two controls: if by silencing a 
gene in the LPS stimulated cell the expression of an LPS-inducible gene moved 
close to its expression in the negative controls, this was counted as an effect 
of the intervention; if a gene's expression stayed close to its expression in the 
positive controls, the gene was counted as being not affected by the interven- 
tion. Applying the same discretization scheme to the positive and negative 
controls makes it possible to estimate the two error rates. 



Analysis based on a single control The two types of controls can be 
used to define a set of informative effect reporters and assess the error rates 
in the data. However, most experimental studies do not contain two kinds 
of controls but only one. To mimic this situation we will make no use of 
the negative controls in the dataset and only include the four LPS-induced 
measurements in our analysis. We show in the following that our improved 
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Figure 5: Application of automatic regularization to the Drosophila data 
set. Each column corresponds to one value of 5 and a selected number of 
observables between 30 and 500. For each value of 5 we plot (1.) the PpO 
score (black bar with scale on the right) and (2.) the number of edges in the 
inferred model (gray bar with scale on the left). The dark grey bar at 143 
observables indicates the optimal degree of regularization. The corresponding 
model is discussed in Fig. |6l 



methodology is still applicable and exploits the information in the data better 
than previous approaches. 

Calculation of the ratio matrix R We use well established methods to 
assess differential gene expression between the positive controls (LPS stim- 
ulation but no gene silencing) and the gene perturbation profiles. Because 
of the small number of samples we chose a highly regularized empirical Bayes 



method for assessing differential expression in microarray experiments (Smyth 



2004), which is implemented in the R-package limma (Smyth, 2005) available 



from www.bioconductor.org. The empirical Bayes approach is equivalent to 
shrinkage of the estimated sample variances towards a pooled estimate, re- 
sulting in far more stable inference when the number of arrays is small. We 
compute likelihood ratios for the comparison of positive controls against every 
gene perturbation. We then select genes which show a positive ratio (regardless 
of its size) for at least two of the four knock-downs. This simple step of deleting 
uninformative genes reduces the number of effect reporters (observables) from 
14 010 to 904. This number is still much bigger than the number of differential 
genes used in previous analyses and makes feature selection necessary. 
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Figure 6: Results on Drosophila data. The upper graph represents T on 
A = {key-, tak-, rel-, mkk4/hep-}, while the assignment is shown as grey 
lines connecting nodes in F with observables. The matrix below shows the 
ratio matrix R (where each column is one observable) with darker values of 
grey indicating higher likelihood rations (see the colorbar on the right). The 
graph T places tak above all other nodes and shows a branch below tak with 
key and rel on one side and mkk^/hep on the other side. The double headed 
arrow between key and rel shows that the model can not distinguish between 
them (see the nearly identical rows in the ratio matrix). 



Results We fit NEM models to the ratio matrix R using the feature selection 
mechanism described in section |6.2| The resulting curve of the PpO statistic 



is shown in Fig [5} The model selected in our automatic procedure includes 
143 observables (out of 904) and is shown in Fig. |6} 

Our model places tak above all other nodes and shows a branch below 
tak with key and rel on one side and mkkJ^/hep on the other side. The gene 
perturbations key and rel remain undistinguishable due to almost identical 
phenotypic profiles (see the nearly identical rows in the ratio matrix in Fig. [6]). 
The branching below tak into two sub-pathways is the main biological feature 



of the data (Boutros et al., 2002) and our model succeeds in recapitulating it. 



A previous analysis of the same data set (Markowetz et al. , 2005) showed 
a very similar picture but included one additional edge from tak to rel, which 



is not contained in our result. It is known that rel is a transcription factor 



responsible for immune response, which is activated via the kinase key (Royet 



et al. 2005). Thus, the additional edge from tak to rel was a spurious result. 



It can be explained by the fact that the NEMs used in (Markowetz et al. 



2005 ) were constrained to transitively closed graphs (and then the direct edge 



from tak to rel is needed because there is a path from tak to rel over key). 
This shows that our general formulation of NEMs, which is not constrained 
to transitively closed graphs, can yield results closer to biological reality than 
previous formulations. 



9 Discussion 



In this paper we introduced a generalized definition of Nested Effects Models 
and expanded their statistical basis in several important directions. Our most 
important theoretical result is that NEMs can be shown to be identifiable 
under mild conditions on the data. 



General NEMs The new general formulation of NEMs expands the model 
class from transitively closed graphs to all directed graphs. This reduces the 
bias in the model and leads to results closer to existing biological knowledge 
in the application to Drosophila immune response. 



Likelihood formulation The new likelihood equation is much more flexible 



than previous equations for binary data (Markowetz et al. 2005, 2007). It is 



applicable to any kind of data by converting it into likelihood ratios for the 
comparison of effects and non-effects. Thus, it can even integrate heteroge- 
neous sources of data as long as they can be translated into likelihood ratios. 
Additionally, missing data or the exclusion of bad measurements is possible 
without changes in the algorithm. 



Model search Our formulation of the likelihood also leads to a fast updating 
procedure which can be carried out in linear time and is exceedingly faster than 
previous approaches. Still, an exhaustive search is clearly infeasible for larger 
values of perturbed genes. However, the fast elementary moves introduced here 
allow the application of combinatorial search algorithms, like Markov Chain 



Monte Carlo (Gilks et al. 1996) or simulated annealing (Kirkpatrick et al. 



1983), to find high scoring models. 



Prior knowledge We showed the usefulness of incorporating prior knowl- 
edge into model search by fixing parts of a bigger model and only inferring 
the unknown part. This is a special case of a prior distribution on the space 
of model graphs. We hope to extend this approach to more flexible structure 
priors. One promising research direction could be to use a structure prior that 
favors transitively closed graphs. In this way it would be possible to find a 
balance between the less biased models introduced here and the causally in- 



terpretable but more constrained models introduced earlier (Markowetz et al. 



2005, 2007). 



Availability of software The NEM exhaustive search algorithm and all its 
extensions described in this paper are implemented, documented and ready 
to use in the R package Nessy, which is available at www.bioconductor.org. 
It includes a plotting routine that conveniently displays nested effects models 
(see, e.g., Figg- 

Appendix: Proofs of Theorems 



Theorem [T] If the data is consistent with the effects model F , then the 



max- 



imum likelihood estimate of (3.4) equals F, 



F = argmax P(D\G) = argmax tr(GR) 
g S g 



Proof. We have tr(GR) = J2 ae A E sG o G asR sa - 

If F as = 1, then by consistency of the data R sa > and the choice G as = 
1 = F as maximizes the summand G as R sa . 

If F as = 0, then R sa < and the coice G as = = F as maximizes the 
summand G as R sa - Hence argmax tr(GR) = F. □ 

G 

Lemma [2] Let (T', ©') be a reversal of (T, G) induced by the permutation 
7r = (ai, a 2 , a n ). Then (T' , 6') is a valid parametrization of F = TQ . 

Proof. Let e a denote the a-th unit column vector of length no- Clearly, T'6' = 
(rS' _1 )(S'6) = T0 = F. The only additional requirement we need to check is 



Y' aa = 1 for all a E A. This holds because of 

Ka = eaiTS-^e* = e a T^el {h) e b e T a = e a Tel {a) 

beA 



J- aTr(a) 



if a G {ai, a n } 
r„„ otherwise 



□ 



Theorem [3] Lei (r, 0) and (V, 0') fre i/ie parameters of two nested effects 
models. Assume that no two distinct actions a,b G A have the same parents 
in T or in T' . Then (T, 0) and (T', ©') are observationally equivalent if and 
only if the tuples can be converted one into another by a sequence of disjoint 
reversals. 

Proof. " <=" : This follows immediately from Lemma [2j 

"=^": For an action a G A, denote the parents of a in T by pa r (a) = {6 G 
p 

^4. | b — > a}. Recall that for an actions graph r, a G pa r (a) for all a E A. The 
set of observables attached to a via the effects graph is called the children 
of a in 0, ch e (a) = {s G O \ a — > s}. Since pa r (a) determines r. a (and vice 
versa), and che(a) determines a . (and vice versa), the family of all parents 
sets determines T and the family of all children sets determines 0. 

0' 

Assume that che(a) and che'(o) intersect nontrivially, say a — > s, b — >• s. Then 

r. a = r. e(s) = (re).. = (r'e'). s = r: e , (s) = r' fc (9.1) 

Hence 

pa r (a) = pa r /(6) (9.2) 



Furthermore, let t G che(a) and t G che'(c). Then by (9.2), pa r (a) = pa r /(c) 



which in turn together with (9.2) implies pa r /(6) = pa r /(c). By the hypothesis, 
this is only possible if b = c. Thus che(a) C che'(fe), and for symmetric reasons, 
che(a) = che'(o). It follows that the partitions 

|Jch (a) = O= |Jche'(a) (9-3) 

aeA aeA 

are identical up to order. Therefore there exists a permutation tt of A such 
that 



che/(a) = che(7r(a)) , a G A 



(9.4) 



Together with (9.2) this implies 

pa r ;(a) = pa r (7r(a)) , a 6 A (9.5) 

In other words, (T', ©') is determined completely by (T, 9) and tt. Let us 
investigate 7r a little closer. Let T = J2beA e&e ^(b) ^ e ^ ne permutation matrix 
associated with n. We confirm that 



{TQ) a . = el E e ^) & = e ^) = -W- " & » > a e ^ > ( 9 - 6 ) 



J9.4|) 

so 0' = TO. By analogous calculations, T' = TT' 1 . Now for a G A, 



i = Ka ^ r an(a) , (9.7) 

which means that there exists an edge from a to n(a) in T. For any a £ A, T 
must therefore contain the cycle a — > tt(o) — > vr 2 (a) — ► ... — ► a. 

Let 7r = 7Ti7r 2 • ... ■ 7T m be a decomposition of 7r into disjoint cycles, and let 
Ti, T 2 , T m be the n^xn^ permutation matrices corresponding to m, 7r 2 , 7i m 
respectively. Clearly, T = TiT 2 • ... • T m . 

Define (r , ©o) = (r,0), and inductively (Tj,Qj) = (Tj-xT^ , T,-0j_i), 
j = l,...,m. Then (T m , m ) = (FT -1 , TO) = (r',0'), and we have con- 
structed a sequence of disjoint reversals converting (T, O) into (r', ©'). □ 
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